postgres.email

further improving roles_is_member_of()

  • Jump to comment-1
    Nathan Bossart<nathandbossart@gmail.com>
    Apr 12, 2024, 4:16 AM UTC
    (moved to a new thread)
    On Thu, Mar 21, 2024 at 04:31:45PM -0400, Tom Lane wrote:
    I wrote:
    ... I still see the problematic GRANT taking ~250ms, compared
    to 5ms in v15. roles_is_member_of is clearly on the hook for that.

    Ah: looks like that is mainly the fault of the list_append_unique_oid
    calls in roles_is_member_of. That's also an O(N^2) cost of course,
    though with a much smaller constant factor.

    I don't think we have any really cheap way to de-duplicate the role
    OIDs, especially seeing that it has to be done on-the-fly within the
    collection loop, and the order of roles_list is at least potentially
    interesting. Not sure how to make further progress without a lot of
    work.
    I looked at this one again because I suspect these "thousands of roles"
    cases are going to continue to appear. Specifically, I tried to convert
    the cached roles lists to hash tables to avoid the list_member_oid linear
    searches. That actually was pretty easy to do. The most complicated part
    is juggling a couple of lists to keep track of the roles we need to iterate
    over in roles_is_member_of().
    AFAICT the only catch is that select_best_grantor() seems to depend on the
    ordering of roles_list so that it prefers a "closer" role. To deal with
    that, I added a "depth" field to the entry struct that can be used as a
    tiebreaker. I'm not certain that this is good enough, though.
    As shown in the attached work-in-progress patch, this actually ends up
    removing more code than it adds, and it seems to provide similar results to
    HEAD (using the benchmark from the previous thread [0]). I intend to test
    it with many more roles to see if it's better in more extreme cases.
    [0] https://postgr.es/m/341186.1711037256%40sss.pgh.pa.us
    --
    Nathan Bossart
    Amazon Web Services: https://aws.amazon.com
    • Jump to comment-1
      Nathan Bossart<nathandbossart@gmail.com>
      Apr 12, 2024, 7:19 PM UTC
      On Thu, Apr 11, 2024 at 11:16:33PM -0500, Nathan Bossart wrote:
      As shown in the attached work-in-progress patch, this actually ends up
      removing more code than it adds, and it seems to provide similar results to
      HEAD (using the benchmark from the previous thread [0]). I intend to test
      it with many more roles to see if it's better in more extreme cases.
      Even with 100K roles, the Bloom filter added in commit d365ae7 seems to do
      a pretty good job at keeping up with the hash table approach. The callers
      of roles_is_member_of() that do list_member_oid() on the returned list
      might benefit a little from a hash table, but I'm skeptical it makes much
      difference in practice. This was an interesting test, but I'll likely
      withdraw this patch shortly.
      --
      Nathan Bossart
      Amazon Web Services: https://aws.amazon.com